WRANGLERS

Missing Migrants

Group

Photo by Nikko Macaspac on Unsplash


Man is not the sum of what he has already, but rather the sum of what he does not yet have, of what he could have….
— Jean-Paul Sartre


Load Data

We will split this data into groups and apply one or more functions to each group, such as sum, mean, median, max and min. Since some values are missing, let’s first replace “NA” values in the numeric columns with 0 (we will revisit this pattern later in the course).

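library(tidyverse)  # read_csv, dplyr verbs, tidyr pivots, ggplot2
library(ggthemes)   # theme_tufte
library(ggiraph)    # girafe

df <- read_csv('./archetypes/missing-migrants/MissingMigrants-Global-2020-10-18T06-37-33.csv')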

df <- df %>% 
  mutate_if(is.numeric, ~replace(., is.na(.), 0))

df
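Replacing “NA” with 0 changes some statistics, so it is worth seeing the effect on a small case. The sketch below uses a made-up table (`toy`, with hypothetical columns `region`, `dead` and `missing`) rather than the real CSV, and uses `across()` as an alternative spelling of the same replacement.

```r
library(dplyr)

# Hypothetical toy data standing in for the real CSV
toy <- tibble(
  region  = c("A", "B", "C"),
  dead    = c(3, NA, 5),
  missing = c(NA, 2, 1)
)

# Replace NA with 0 in every numeric column; text columns are untouched
toy_filled <- toy %>%
  mutate(across(where(is.numeric), ~ replace(.x, is.na(.x), 0)))

toy_filled

# Sums are unaffected by the replacement, but means are not
mean(toy$dead, na.rm = TRUE)  # 4 (NA ignored)
mean(toy_filled$dead)         # 2.666667 (NA counted as 0)
```

For sums, which is what the rest of this page computes, treating “NA” as 0 gives the same result as skipping it. For means or medians the two choices differ, so the replacement should be a deliberate decision.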

Group one variable with one calculation

First, let’s create a simple group-by on “Reported Year” and calculate the sum of the “Total Dead and Missing” persons.

df_1 <- df %>% 
  group_by(`Reported Year`) %>% 
  summarise(sum_number=sum(`Total Dead and Missing`))

df_1
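The same pattern extends to several calculations at once: each named argument to `summarise()` becomes one output column. Here is a minimal sketch on made-up data (the `toy` table and its `year` and `total` columns are hypothetical, not taken from the CSV).

```r
library(dplyr)

# Hypothetical toy data: a few incidents per year
toy <- tibble(
  year  = c(2019, 2019, 2020, 2020, 2020),
  total = c(10, 4, 7, 1, 1)
)

# One row per year, one column per summary function
toy_summary <- toy %>%
  group_by(year) %>%
  summarise(
    sum_number    = sum(total),
    mean_number   = mean(total),
    median_number = median(total),
    max_number    = max(total),
    min_number    = min(total)
  )

toy_summary
```

With the real data, `year` would be `Reported Year` and `total` would be `Total Dead and Missing`.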

Group more than one variable with one calculation

Let’s now group by “Reported Year” and “Reported Month”, calculating the sum of the “Total Dead and Missing”.

df_2 <- df %>% 
  group_by(`Reported Year`, `Reported Month`) %>% 
  summarise(sum_number=sum(`Total Dead and Missing`))

df_2
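When grouping by more than one variable, `summarise()` by default removes only the last grouping level, so the result above is still grouped by year. The sketch below, on made-up data with hypothetical column names, shows how to inspect this with `group_vars()` and how the `.groups` argument returns an ungrouped table instead.

```r
library(dplyr)

# Hypothetical toy data
toy <- tibble(
  year  = c(2019, 2019, 2019, 2020),
  month = c("Jan", "Jan", "Feb", "Jan"),
  total = c(1, 2, 3, 4)
)

# Default behaviour: the last grouping variable (month) is dropped
by_default <- toy %>%
  group_by(year, month) %>%
  summarise(sum_number = sum(total))

group_vars(by_default)  # "year"

# Explicitly drop all grouping after summarising
ungrouped <- toy %>%
  group_by(year, month) %>%
  summarise(sum_number = sum(total), .groups = "drop")

group_vars(ungrouped)   # character(0)
```

Either form works for the steps that follow on this page. Dropping the groups is mainly useful when later verbs such as `mutate()` or `arrange()` should act on the whole table rather than within each year.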

Notice that the data is presented vertically, with only 3 columns, and each row represents one observation (here, the sum for a specific year and month). This is called the “Long data format”, and it is the format best suited to the chart below.

Time to plot

df_2$`Reported Month` <- factor(df_2$`Reported Month`, levels=c('Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec'))
df_2$`Reported Year` <- factor(df_2$`Reported Year`, levels=c('2014','2015','2016','2017','2018','2019','2020'))
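The two `factor()` calls above exist because month abbreviations are text, and text sorts alphabetically rather than by calendar order. A minimal base R sketch of the difference, using a made-up vector `months` and R's built-in `month.abb` constant:

```r
# Hypothetical vector of month abbreviations in arbitrary order
months <- c("Mar", "Jan", "Feb", "Dec")

# As plain text, sorting is alphabetical
sort(months)                  # "Dec" "Feb" "Jan" "Mar"

# As a factor with explicit levels, sorting follows the level order
months_f <- factor(months, levels = month.abb)
as.character(sort(months_f))  # "Jan" "Feb" "Mar" "Dec"

# A value that is not among the levels silently becomes NA
is.na(factor("June", levels = month.abb))  # TRUE
```

The last line is worth keeping in mind: if the CSV ever spelled a month differently from the listed levels, those rows would turn into “NA” on the axis rather than raise an error.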
year_palette <- c("2014" = "#E9EAF6", "2015" = "#C5C8E0", "2016" = "#959DCA", "2017" = "#4764A8", "2018" = "#6F7DB6",
                  "2019" = "#33B5B3", "2020" = "#C5C8E0")


v1 <- ggplot(df_2,aes(x=`Reported Month`, y=sum_number, fill=`Reported Year`)) +
  geom_hline(yintercept = 0, size=0.1, color = "grey") +
  geom_hline(yintercept = 500, size=0.1, color = "grey") +
  geom_hline(yintercept = 1000, size=0.1, color = "grey") +
  geom_hline(yintercept = 1500, size=0.1, color = "grey") +
  geom_bar(stat="identity", position=position_dodge()) +
  scale_fill_manual(values=year_palette, name = "YEAR") +
  geom_text(aes(label = sum_number), hjust=-0.1, size = 3, angle=90,
            position = position_dodge(0.9)) +
  theme_tufte(base_size = 15) +
  theme(
    panel.background = element_blank(),
    plot.title = element_blank(),
    axis.title.x = element_blank(),
    axis.title.y = element_blank(),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    axis.ticks = element_blank()
    #plot.margin = unit(c(1, 5, 1, 1), "lines")
  ) + 
  theme(legend.position="top") + 
  guides(fill = guide_legend(nrow = 1))  # the legend comes from the fill aesthetic


girafe(ggobj = v1, width_svg = 16, height_svg = 9, options =
list(opts_sizing(rescale = TRUE, width = 1.0)))

Although the “Long format” is what the plotting library needs, it is not very reader-friendly. Let’s use the “Wide format” to make the data easier to review. We can do this by pivoting the data.

Pivot

Pivoting spreads the values of “Reported Month” into one column per month, leaving one row per “Reported Year”. The relocate step then puts the month columns in calendar order.

df_3 <- df_2 %>% 
  pivot_wider(names_from = `Reported Month`, values_from = sum_number) %>% 
  relocate(`Reported Year`, Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec)

df_3
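If a year has no rows for some month, the wide table shows “NA” in that cell. The sketch below uses made-up data (a hypothetical `long` table) to show this, together with the `values_fill` argument of `pivot_wider()` as one way to put 0 there instead.

```r
library(dplyr)
library(tidyr)

# Hypothetical long table: 2020 has no row for Feb
long <- tibble(
  year       = c(2019, 2019, 2020),
  month      = c("Jan", "Feb", "Jan"),
  sum_number = c(5, 3, 8)
)

# One column per month; missing year/month combinations become 0
wide <- long %>%
  pivot_wider(
    names_from  = month,
    values_from = sum_number,
    values_fill = 0
  )

wide
```

Whether 0 or “NA” is the right filler depends on the meaning: 0 asserts that nothing was reported, while “NA” only says that no row existed.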

Unpivot

Let’s unpivot this data to return to “Long data format”.

df_4 <- df_3  %>%
  pivot_longer(!`Reported Year`,names_to = "MONTH", values_to = "TOTAL_DEAD")
df_4
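One detail of the round trip is easy to miss: `pivot_longer()` returns the former column names as plain text, so any factor ordering of the months has to be reapplied afterwards. A small sketch on made-up data (the `wide` table is hypothetical; the `MONTH` and `TOTAL_DEAD` names follow the code above):

```r
library(dplyr)
library(tidyr)

# Hypothetical wide table: one row per year, one column per month
wide <- tibble(
  year = c(2019, 2020),
  Jan  = c(5, 8),
  Feb  = c(3, 0)
)

# Every column except year is stacked into name/value pairs
long <- wide %>%
  pivot_longer(!year, names_to = "MONTH", values_to = "TOTAL_DEAD")

long
class(long$MONTH)  # "character"

# Restore calendar order before plotting
long$MONTH <- factor(long$MONTH, levels = month.abb)
```

The result has one row per year and month, so 2 years and 2 months give 4 rows.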

References

citations for narrative and data sources

IOM, Missing Migrants

@misc{missingmigrants_2001_missing,
  author = {MissingMigrants},
  month = {06},
  title = {Missing Migrants Project},
  url = {https://missingmigrants.iom.int/},
  urldate = {2021-06-08},
  year = {2001},
  organization = {Iom.int}
}